
    A Roadmap for HEP Software and Computing R&D for the 2020s

    Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade. Peer reviewed.

    Computing in High Energy and Nuclear Physics (CHEP) 2012

    This contribution describes a prototype grid proxy cache system developed at Nikhef, motivated by a desire to construct the first building block of a future HTTPS-based Content Delivery Network for multiple-VO grid infrastructures. Two goals drove the project: firstly, to provide a "native view" of the grid for desktop-type users, and secondly, to improve performance for physics-analysis type use cases, where multiple passes are made over the same set of data (residing on the grid). We further constrained the design by requiring that the system be made of standard components wherever possible. The prototype that emerged from this exercise is a horizontally scalable, cooperating system of web-server/cache nodes, fronted by a customized WebDAV server. The WebDAV server is custom only in the sense that it supports HTTP redirects (providing horizontal scaling) and that its authentication module has, as a back end, a proxy delegation chain that can be used by the cache nodes to retrieve files from the grid. The prototype was deployed at Nikhef and tested at a scale of several terabytes of data and approximately one hundred fast cores of computing. Both small and large files were tested, in a number of scenarios and with various numbers of cache nodes, in order to understand the scaling properties of the system. For properly dimensioned cache-node hardware, the system showed speedups of several integer factors for the analysis-type use cases. These results and others are presented and discussed in this contribution.
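    The redirect-based horizontal scaling described above can be illustrated with a small sketch. The Python fragment below is not the Nikhef prototype itself; node names and ports are assumptions. It only shows how a front-end can answer each GET with an HTTP 302 redirect that deterministically maps a file path to one of several cache nodes, so that repeated reads of the same file land on the same cache.

        # Minimal sketch of a redirecting front-end (illustrative, not the prototype).
        import hashlib
        from http.server import BaseHTTPRequestHandler, HTTPServer

        CACHE_NODES = [                      # hypothetical cache-node endpoints
            "http://cache01.example.org:8080",
            "http://cache02.example.org:8080",
        ]

        def pick_node(path: str) -> str:
            """Map a file path to a cache node deterministically (simple hashing)."""
            digest = hashlib.sha1(path.encode()).digest()
            return CACHE_NODES[digest[0] % len(CACHE_NODES)]

        class RedirectingFrontend(BaseHTTPRequestHandler):
            def do_GET(self):
                # Send the client to the cache node responsible for this path.
                self.send_response(302)
                self.send_header("Location", pick_node(self.path) + self.path)
                self.end_headers()

        if __name__ == "__main__":
            HTTPServer(("", 8000), RedirectingFrontend).serve_forever()

    In the real system the redirect target would be chosen by the WebDAV front-end and the cache nodes would fetch misses from the grid through the delegated proxy chain; the hashing here merely stands in for that placement logic.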

    Job Failure Analysis and Its Implications in a Large-Scale Production Grid

    In this paper we present an initial analysis of job failures in a large-scale data-intensive Grid. Based on three representative periods in production, we characterize the interarrival times and life spans of failed jobs. Different failure types are distinguished and the analysis is carried out further at the Virtual Organization (VO) level. The spatial behavior, namely where job failures occur in the Grid, is also examined. Cross-correlation structures, including how arrivals correlate with life spans of job failures, are analyzed and illustrated. We further investigate statistical models to fit the failure data and propose several failure-aware scheduling strategies at the Grid level. Our results show that the overall failure rates in the Grid are quite significant, ranging from 25% to 33% of all submitted jobs. However, only 5% to 8% of the jobs failed after running on a certain Computing Element (CE); the rest of the failed jobs were aborted or cancelled without running. A majority of failed jobs come from several large production VOs, and a large number of these failures are centered around several main CEs. The interarrival time processes of failed jobs are shown to be bursty, and the life spans exhibit strong autocorrelations. Based on the failure patterns we argue that it is important for the Grid resource brokers to track historical failures and take them into account in decision making. Some proactive measures and accountability issues are also discussed.
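    As an illustration of the failure-aware brokering idea, the sketch below (not taken from the paper; the CE names, counts, and smoothing prior are assumptions) ranks Computing Elements by a smoothed historical failure fraction and submits to the least failure-prone one.

        # Illustrative failure-aware brokering heuristic (hypothetical data).
        from dataclasses import dataclass

        @dataclass
        class CEHistory:
            name: str
            submitted: int
            failed: int

        def failure_score(ce: CEHistory, prior_failed: float = 1.0, prior_jobs: float = 4.0) -> float:
            """Smoothed failure rate; the prior keeps sparsely observed CEs near 25%."""
            return (ce.failed + prior_failed) / (ce.submitted + prior_jobs)

        def pick_ce(history: list[CEHistory]) -> CEHistory:
            # Prefer the CE with the lowest smoothed historical failure fraction.
            return min(history, key=failure_score)

        history = [
            CEHistory("ce01.example.org", submitted=1000, failed=300),
            CEHistory("ce02.example.org", submitted=800, failed=60),
            CEHistory("ce03.example.org", submitted=5, failed=0),
        ]
        print(pick_ce(history).name)   # ce02.example.org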

    Summary

    In recent years, Grids have been used extensively for a wide range of scientific purposes, gradually making this infrastructure a more accepted alternative to conventional cluster computing. However, even today the reliability of computational sites in Grids tends to leave a lot to be desired. We argue that a priori estimation of site reliability will give users the opportunity to decrease the likelihood of job failure, thereby reducing excess consumption of grid resources and improving the user experience of utilizing the Grid. This work focuses on an in-depth analysis of user-perceived reliability of grid infrastructures and proposes a user-centric approach to reduce the number of failures on grids. We proceed by proposing a platform-independent design for a site reliability prediction tool and a description of a platform-specific implementation. In conclusion, we provide validation experiments to give an impression of the potential added value of using a reliability prediction tool.
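    A minimal sketch of such a site-reliability estimate, assuming only a per-site stream of boolean job outcomes, could use an exponentially weighted moving average so that recent behaviour dominates the prediction. The parameters below are illustrative, not those of the proposed tool.

        # Minimal user-centric site-reliability estimator (illustrative parameters).
        class SiteReliability:
            def __init__(self, alpha: float = 0.1, initial: float = 0.5):
                self.alpha = alpha          # weight given to the newest outcome
                self.estimate = initial     # prior reliability before any observations

            def update(self, success: bool) -> float:
                # Exponentially weighted moving average of job success.
                self.estimate = (1 - self.alpha) * self.estimate + self.alpha * (1.0 if success else 0.0)
                return self.estimate

        tracker = SiteReliability()
        for outcome in [True, True, False, True, False, False, True]:
            tracker.update(outcome)
        print(f"predicted success probability: {tracker.estimate:.2f}")

    A user or broker could then avoid sites whose predicted success probability falls below a chosen threshold, which is the behaviour the prediction tool is meant to enable.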

    The experiment dashboard for medical applications

    The Experiment Dashboard is a monitoring system initially developed for the LHC experiments to provide a view of the Grid infrastructure from the perspective of the LHC virtual organization. The poster describes the first experience of deploying and using the system outside the LHC community, for monitoring of medical applications on the Grid. Functional magnetic resonance imaging (fMRI) is a popular tool used in neuroscience research to study brain function. The Virtual Lab for fMRI (VL-fMRI) is developed as one of the activities of the "Medical Diagnosis and Imaging" subprogram of the Virtual Laboratory for e-Sciences Project. VL-fMRI has taken steps to enable data management and analysis tasks for fMRI studies on the Grid infrastructure. Since spring 2006 the Experiment Dashboard has been used for job-processing monitoring of the VL-fMRI activities. The Experiment Dashboard provides an easy way for users to follow their jobs on the distributed infrastructure. Furthermore, the system makes it possible to detect problems or inefficiencies of Grid sites or services and to understand their underlying causes. This functionality is important for site administrators and VO support teams. fMRI studies are data intensive, since large amounts of data are stored, analyzed and manipulated. They require high-throughput computation on demand for real-time image analysis and for large-scale studies. Collaboration and distributed computing are essential, in particular for multi-center studies, where data is distributed. Using the Grid infrastructure is a natural choice to satisfy the requirements mentioned above. On the other hand, fMRI users (in particular psychologists, psychiatrists, radiologists, etc.) typically have a limited background in computing and therefore need a user-friendly environment that enables the preparation, submission and monitoring of their jobs on the Grid. The Experiment Dashboard provides the job monitoring functionality for the fMRI users and VO supporters. The first experience of using the Experiment Dashboard by the VL-fMRI community was positive: it proved that the system, initially developed for the High Energy Physics community, is flexible enough and provides the necessary functionality to be easily adapted to the needs of users in completely different fields.
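    The per-site aggregation behind this kind of job monitoring can be sketched as follows; the job records, status names, and 10% threshold are hypothetical and only illustrate how terminal job states can be counted per site and problematic sites flagged for administrators and VO supporters.

        # Illustrative dashboard-style aggregation of job outcomes per site.
        from collections import defaultdict

        jobs = [                                   # hypothetical job records
            {"site": "siteA", "status": "Done"},
            {"site": "siteA", "status": "Aborted"},
            {"site": "siteB", "status": "Done"},
            {"site": "siteB", "status": "Done"},
        ]

        per_site = defaultdict(lambda: {"Done": 0, "Aborted": 0})
        for job in jobs:
            per_site[job["site"]][job["status"]] += 1

        for site, counts in per_site.items():
            total = counts["Done"] + counts["Aborted"]
            failure_fraction = counts["Aborted"] / total if total else 0.0
            flag = "  <-- investigate" if failure_fraction > 0.10 else ""
            print(f"{site}: {counts['Done']} done, {counts['Aborted']} aborted "
                  f"({failure_fraction:.0%} failed){flag}")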

    HEP Applications and Their Experience with the Use of DataGrid Middleware

    An overview is presented of the characteristics of HEP computing and its mapping to the Grid paradigm. This is followed by a synopsis of the main experiences and lessons learned by HEP experiments in their use of DataGrid middleware using both the EDG application testbed and the LCG production service. Particular reference is made to experiment 'data challenges', and a forward look is given to necessary developments in the framework of the EGEE project.
